Extracting compound terms from domain corpora
Related resources
Extracting Paraphrases of Technical Terms from Noisy Parallel Software Corpora
In this paper, we study the problem of extracting technical paraphrases from a parallel software corpus, namely, a collection of duplicate bug reports. Paraphrase acquisition is a fundamental task in the emerging area of text mining for software engineering. Existing paraphrase extraction methods are not entirely suitable here due to the noisy nature of bug reports. We propose a number of techn...
Extracting word lists for domain-specific implicit opinions from corpora
Sentiment analysis relies to a large extent on lexical resources. While lists of words bearing a context-independent evaluative polarity (‘great’, ‘bad’) are available for many languages now, the automatic extraction of domain-specific evaluative vocabulary still needs attention. This holds especially for implicit opinions or so-called polar facts. In our work, we focus on German and on a genre ...
Bilingual lexicon extraction from comparable corpora using in-domain terms
Many existing methods for bilingual lexicon learning from comparable corpora are based on similarity of context vectors. These methods suffer from noisy vectors that greatly affect their accuracy. We introduce a method for filtering this noise allowing highly accurate learning of bilingual lexicons. Our method is based on the notion of in-domain terms which can be thought of as the most importa...
A Domain Independent Approach for Extracting Terms from Research Papers
We study the problem of extracting terms from research papers, which is an important step towards building knowledge graphs in the research domain. Existing terminology extraction approaches are mostly domain dependent. They use domain-specific linguistic rules, supervised machine learning techniques, or a combination of the two to extract the terms. Using domain knowledge requires much human effort...
Extracting Paraphrases from Aligned Corpora
The Problem: The expressiveness of human language allows people to express the same idea in many different ways; they may use different words to refer to the same entity or employ different phrases to describe the same concept. Thus, an effective information retrieval (IR) and question answering (QA) system must be equipped to handle these variations, both when processing documents and when fie...
Journal
Journal title: Journal of the Brazilian Computer Society
Year: 2010
ISSN: 0104-6500,1678-4804
DOI: 10.1007/s13173-010-0020-4